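We introduce the partially observable history process (POHP) formalism for reinforcement learning. POHP centers around the actions and observations of a single agent and abstracts away the presence of other players without reducing them to stochastic processes. Our formalism provides a streamlined interface for designing algorithms that defy categorization as exclusively single or multi-agent, and for developing theory that applies across these domains. We show how the POHP formalism unifies traditional models including the Markov decision process, the Markov game, the extensive-form game, and their partially observable extensions, without introducing burdensome technical machinery or violating the philosophical underpinnings of reinforcement learning. We illustrate the utility of our formalism by concisely exploring observable sequential rationality, re-deriving the extensive-form regret minimization (EFR) algorithm, and examining EFR's theoretical properties in greater generality.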
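Min-max optimization problems (i.e., min-max games) have been attracting a great deal of attention because of their applicability to a wide range of machine learning problems. Although significant progress has been made recently, the literature to date has focused on games with independent strategy sets; little is known about solving games with dependent strategy sets, which can be characterized as min-max Stackelberg games. We introduce two first-order methods that solve a large class of convex-concave min-max Stackelberg games, and show that our methods converge in polynomial time. Min-max Stackelberg games were first studied by Wald, under the posthumous name of Wald's maximin model, a variant of which is the main paradigm used in robust optimization, which means that our methods can likewise solve many convex robust optimization problems. We observe that the computation of competitive equilibria in Fisher markets also comprises a min-max Stackelberg game. Further, we demonstrate the efficacy and efficiency of our algorithms in practice by computing competitive equilibria in Fisher markets with varying utility structures. Our experiments suggest potential ways to extend our theoretical results, by demonstrating how different smoothness properties can affect the convergence rate of our algorithms.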
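For orientation, a game with dependent strategy sets can be written as follows. This is only a sketch of one common formulation; the symbols $f$, $g$, $X$, and $Y$ are notation assumed here for illustration and do not appear in the abstract.

```latex
% Assumed notation: X and Y are the players' base strategy sets,
% f is the objective, and g couples the inner player's feasible
% set to the outer player's choice x.
\min_{x \in X} \; \max_{y \in Y \,:\, g(x, y) \ge 0} \; f(x, y)
```

When the constraint $g(x, y) \ge 0$ does not depend on $x$, this reduces to the familiar min-max game with independent strategy sets.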
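Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria. To develop hindsight rational learning in sequential decision-making settings, we formalize behavioral deviations as a general class of deviations that respect the structure of extensive-form games. Integrating the idea of time selection into counterfactual regret minimization (CFR), we introduce the extensive-form regret minimization (EFR) algorithm, which achieves hindsight rationality for any given set of behavioral deviations with computation that scales closely with the complexity of the set. We identify behavioral deviation subsets, the partial sequence deviation types, that subsume previously studied types and lead to efficient EFR instances in games of moderate length. In addition, we present a thorough empirical analysis of EFR instantiated with different deviation types in benchmark games, where we find that stronger types typically induce better performance.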
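To make "no-regret learning with respect to a set of deviations" concrete, the following is a minimal sketch of plain regret matching in a two-player zero-sum matrix game. It is not EFR: it only handles the simplest deviation set (switching to one fixed action) in a normal-form game, whereas the abstract concerns behavioral deviations in extensive-form games. The payoff matrix and the names `regret_matching` and `self_play` are assumptions made for illustration.

```python
def regret_matching(cum_regret):
    """Mix over actions in proportion to positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret yet: fall back to uniform play.
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


def self_play(payoff, iterations):
    """Run regret matching for both players; return their average strategies.

    payoff[i][j] is the row player's payoff; the column player receives
    the negative of it (zero-sum).
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        row = regret_matching(row_regret)
        col = regret_matching(col_regret)
        # Expected payoff of each pure action against the opponent's mix.
        row_values = [sum(payoff[i][j] * col[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * row[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_ev = sum(row[i] * row_values[i] for i in range(n_rows))
        col_ev = sum(col[j] * col_values[j] for j in range(n_cols))
        # Regret of a deviation = value of always playing that action
        # minus the value of the strategy actually played.
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_ev
            row_sum[i] += row[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_ev
            col_sum[j] += col[j]
    return ([s / iterations for s in row_sum],
            [s / iterations for s in col_sum])


# Hypothetical 2x2 zero-sum game whose unique equilibrium has both
# players choosing their first action with probability 0.4.
PAYOFF = [[2.0, -1.0], [-1.0, 1.0]]
row_avg, col_avg = self_play(PAYOFF, 20000)
```

In a zero-sum game the average strategies of two no-regret learners approach an equilibrium, so `row_avg` and `col_avg` should both end up close to `[0.4, 0.6]` here. Richer deviation sets, such as the behavioral deviations described above, strengthen the guarantee at additional computational cost.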
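Driven by recent successes in two-player, zero-sum games, artificial intelligence work on games has increasingly focused on algorithms that produce equilibrium-based strategies. However, this approach has been less effective at producing competent players in general-sum games, or in games with more than two players, than in two-player, zero-sum games. An appealing alternative is to consider adaptive algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior. This approach also leads to a game-theoretic analysis, but of the correlated play that arises from joint learning dynamics rather than of factored agent behavior at equilibrium. We develop and advocate for this hindsight rationality framing of learning in general sequential decision-making settings. To this end, we re-examine mediated equilibrium and deviation types in extensive-form games, thereby gaining a more complete understanding and resolving past misconceptions. We present a set of examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature, and prove that no tractable concept subsumes all others. This line of inquiry culminates in the definition of the deviation and equilibrium classes that correspond to algorithms in the counterfactual regret minimization (CFR) family, relating them to all others in the literature. Examining CFR in greater detail further leads to a new recursive definition of rationality in correlated play that extends sequential rationality in a way that naturally applies to hindsight evaluation.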
Traditional approaches to RL have focused on learning decision policies directly from episodic decisions, while slowly and implicitly learning the semantics of compositional representations needed for generalization. While some approaches have been adopted to refine representations via auxiliary self-supervised losses while simultaneously learning decision policies, learning compositional representations from hand-designed and context-independent self-supervised losses (multi-view) still adapts relatively slowly to the real world, which contains many non-IID subspaces requiring rapid distribution shift in both time and spatial attention patterns at varying levels of abstraction. In contrast, supervised language model cascades have shown the flexibility to adapt to many diverse manifolds, and hints of the self-learning needed for autonomous task transfer. However, to date, transfer methods for language models such as few-shot learning and fine-tuning still require human supervision, and transfer learning using self-learning methods has been underexplored. We propose a self-supervised loss policy called contrastive distillation, which manifests latent variables with high mutual information with both source and target tasks from weights to tokens. We show how this outperforms common methods of transfer learning and suggests a useful design axis of trading off compute for generalizability for online transfer. Contrastive distillation is improved through sampling from memory and suggests a simple algorithm for sampling negative examples for contrastive losses more efficiently than random sampling.
Realistic synthetic image data rendered from 3D models can be used to augment image sets and train image classification and semantic segmentation models. In this work, we explore how high-quality physically-based rendering and domain randomization can efficiently create a large synthetic dataset based on production 3D CAD models of a real vehicle. We use this dataset to quantify the effectiveness of synthetic augmentation using U-net and Double-U-net models. We found that, for this domain, synthetic images were an effective technique for augmenting limited sets of real training data. We observed that models trained on purely synthetic images had a very low mean prediction IoU on real validation images. We also observed that adding even very small amounts of real images to a synthetic dataset greatly improved accuracy, and that models trained on datasets augmented with synthetic images were more accurate than those trained on real images alone. Finally, we found that in use cases that benefit from incremental training or model specialization, pretraining a base model on synthetic images provided a sizeable reduction in the training cost of transfer learning, allowing up to 90% of the model training to be front-loaded.
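The mean prediction IoU mentioned above can be illustrated with a short sketch for binary segmentation masks. The function names `iou` and `mean_iou`, the nested-list mask representation, and the convention of scoring two empty masks as 1.0 are assumptions made for this example, not details taken from the paper.

```python
def iou(pred, target):
    """Intersection over union of two same-shaped binary masks.

    Masks are nested lists of 0/1 values (rows of pixels).
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


def mean_iou(pairs):
    """Average IoU over a list of (prediction, ground truth) mask pairs."""
    return sum(iou(pred, target) for pred, target in pairs) / len(pairs)


# Hypothetical 2x2 masks: one overlapping pixel out of three marked pixels.
prediction = [[1, 1], [0, 0]]
ground_truth = [[1, 0], [1, 0]]
score = iou(prediction, ground_truth)
```

Here `score` is 1/3: the masks share one foreground pixel and together cover three. A low mean IoU on real validation images therefore means the predicted and true regions overlap only slightly on average.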
Due to the low signal-to-noise ratio and limited resolution of functional MRI data, and the high complexity of natural images, reconstructing a visual stimulus from human brain fMRI measurements is a challenging task. In this work, we propose a novel approach for this task, which we call Cortex2Image, to decode visual stimuli with high semantic fidelity and rich fine-grained detail. In particular, we train a surface-based convolutional network model that maps from brain response to semantic image features first (Cortex2Semantic). We then combine this model with a high-quality image generator (Instance-Conditioned GAN) to train another mapping from brain response to fine-grained image features using a variational approach (Cortex2Detail). Image reconstructions obtained by our proposed method achieve state-of-the-art semantic fidelity, while yielding good fine-grained similarity with the ground-truth stimulus. Our code is available at: https://github.com/zijin-gu/meshconv-decoding.git.
Breast cancer is the second most common type of cancer in women in Canada and the United States, representing over 25% of all new female cancer cases. Neoadjuvant chemotherapy treatment has recently risen in usage as it may result in a patient having a pathologic complete response (pCR), and it can shrink inoperable breast cancer tumors prior to surgery so that the tumor becomes operable, but it is difficult to predict a patient's pathologic response to neoadjuvant chemotherapy. In this paper, we investigate the efficacy of leveraging learnt volumetric deep features from a newly introduced magnetic resonance imaging (MRI) modality called synthetic correlated diffusion imaging (CDI$^s$) for the purpose of pCR prediction. More specifically, we leverage a volumetric convolutional neural network to learn volumetric deep radiomic features from a pre-treatment cohort and construct a predictor based on the learnt features using the post-treatment response. As the first study to explore the utility of CDI$^s$ within a deep learning perspective for clinical decision support, we evaluated the proposed approach using the ACRIN-6698 study against those learnt using gold-standard imaging modalities, and found that the proposed approach can provide enhanced pCR prediction performance and thus may be a useful tool to aid oncologists in improving recommendation of treatment of patients. Subsequently, this approach to leverage volumetric deep radiomic features (which we name Cancer-Net BCa) can be further extended to other applications of CDI$^s$ in the cancer domain to further improve prediction performance.
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
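Convolutional neural networks trained using manually generated labels are commonly used for semantic or instance segmentation. In precision agriculture, automated flower detection methods use supervised models and post-processing techniques that may not perform consistently as the appearance of the flowers and the data acquisition conditions vary. We propose a self-supervised learning strategy to enhance the sensitivity of segmentation models to different flower species using automatically generated pseudo-labels. We employ a data augmentation and refinement approach to improve the accuracy of the model predictions. The augmented semantic predictions are then converted to panoptic pseudo-labels to iteratively train a multi-task model. The self-supervised model predictions can be refined with existing post-processing approaches to further improve their accuracy. An evaluation on a multi-species fruit tree flower dataset demonstrates that our method outperforms state-of-the-art models without computationally expensive post-processing steps, providing a new baseline for flower detection applications.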